Replace hardcoded init.krun with generic virtual file overlay #673
mtjhrc wants to merge 14 commits into
No comments on the code but I really love the direction!
Move the init binary build script and include_bytes!() from the devices crate into a new init-blob crate. The passthrough modules reference the binary as init_blob::INIT_BINARY instead of using include_bytes! directly. Inspired by containers#593 by Geoffrey Goodman <geoff@goodman.dev>. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Replace the private next_inode AtomicU64 inside PassthroughFs with a shared InodeAllocator that is passed in at construction. This lets multiple layers (e.g. a future virtual-inode overlay) allocate from the same counter without implicit coordination via reserved ranges. The allocator starts at ROOT_ID + 2, reserving inode 2 for the existing init_inode in PassthroughFs. This reservation is removed in the next commit when init handling moves to AugmentFs. PassthroughFs::new() and PassthroughFsRo::new() now take an Arc<InodeAllocator> parameter. FsWorker::new() creates the allocator and passes it through. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
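A minimal sketch of the shared-counter idea from this commit, assuming a plain atomic counter behind an `Arc`. The `Layer` type and the method names `allocate` and `new_inode` are hypothetical and only illustrate two layers drawing from one counter; the real `InodeAllocator` in the PR may look different.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Root inode number used by the FUSE protocol.
pub const ROOT_ID: u64 = 1;

/// Hypothetical stand-in for the allocator described in the commit message.
pub struct InodeAllocator {
    next: AtomicU64,
}

impl InodeAllocator {
    /// `first` is the first inode number that will be handed out.
    pub fn new(first: u64) -> Self {
        Self { next: AtomicU64::new(first) }
    }

    /// Returns a fresh inode number; never returns the same value twice.
    pub fn allocate(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Hypothetical filesystem layer that receives the allocator at construction
/// instead of owning a private counter.
pub struct Layer {
    inodes: Arc<InodeAllocator>,
}

impl Layer {
    pub fn new(inodes: Arc<InodeAllocator>) -> Self {
        Self { inodes }
    }

    pub fn new_inode(&self) -> u64 {
        self.inodes.allocate()
    }
}
```

With `InodeAllocator::new(ROOT_ID + 2)`, two layers sharing the same `Arc` hand out 3, 4, 5, ... without needing reserved ranges, and inode 2 stays free for the existing init_inode.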
Introduce AugmentFs<T>, a generic overlay that wraps any FileSystem implementation and intercepts FUSE operations for virtual inodes — synthetic read-only files and directories backed by static data. One-shot files can only be looked up once. Remove all init.krun special-case code (init_inode, init_handle, INIT_CSTR) from both the Linux and macOS passthrough implementations. The init.krun virtual file is now configured via VirtualDirEntry in the krun API layer and handled generically by the overlay. FsDeviceConfig carries a Vec<VirtualDirEntry> and FsWorker wraps AugmentFs<PassthroughFs> / AugmentFs<PassthroughFsRo>. The InodeAllocator now starts at ROOT_ID + 1 since the init_inode reservation is no longer needed. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
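To illustrate the wrapping pattern and the one-shot semantics, here is a heavily simplified sketch. It is not the PR's code: the `SimpleFs` trait is a hypothetical one-method stand-in for the real `FileSystem` trait, lookups return file contents directly instead of FUSE entries, and the assumption that virtual entries shadow the inner filesystem is mine.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, one-method stand-in for the real FileSystem trait.
pub trait SimpleFs {
    fn lookup(&self, name: &str) -> Option<Vec<u8>>;
}

/// Inner filesystem with no entries at all.
pub struct EmptyFs;

impl SimpleFs for EmptyFs {
    fn lookup(&self, _name: &str) -> Option<Vec<u8>> {
        None
    }
}

/// A read-only virtual file backed by static data.
pub struct VirtualFile {
    pub data: &'static [u8],
    pub one_shot: bool,
}

/// Generic overlay: answers for virtual entries itself, delegates the rest.
pub struct AugmentFs<T: SimpleFs> {
    inner: T,
    virtual_files: Mutex<HashMap<String, VirtualFile>>,
}

impl<T: SimpleFs> AugmentFs<T> {
    pub fn new(inner: T, files: Vec<(String, VirtualFile)>) -> Self {
        Self {
            inner,
            virtual_files: Mutex::new(files.into_iter().collect()),
        }
    }
}

impl<T: SimpleFs> SimpleFs for AugmentFs<T> {
    fn lookup(&self, name: &str) -> Option<Vec<u8>> {
        let mut files = self.virtual_files.lock().unwrap();
        if let Some(one_shot) = files.get(name).map(|f| f.one_shot) {
            return if one_shot {
                // One-shot entries are removed on their first lookup.
                files.remove(name).map(|f| f.data.to_vec())
            } else {
                files.get(name).map(|f| f.data.to_vec())
            };
        }
        drop(files);
        // Not a virtual entry: fall through to the wrapped filesystem.
        self.inner.lookup(name)
    }
}
```

Because the overlay only needs `T: SimpleFs`, the same wrapper works over any backend, which mirrors how the commit wraps both `PassthroughFs` and `PassthroughFsRo`.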
Add API to prevent the default init binary (/init.krun) from being
injected into the root filesystem. Follows the existing
krun_disable_implicit_{console,vsock} pattern.
Must be called before krun_set_root().
Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Add C APIs to inject virtual files and directories into a virtiofs device. Entries are backed entirely by host memory (no host file). Files support one-shot semantics (disappear after the first lookup). Paths may contain '/' to nest entries inside existing virtual directories (e.g. krun_fs_add_overlay_dir for "etc", then krun_fs_add_overlay_file for "etc/hostname"). Intermediate directories must already exist; -ENOENT / -ENOTDIR is returned otherwise. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
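The nesting rule (intermediate directories must already exist, otherwise -ENOENT or -ENOTDIR) can be sketched as a walk over a tree of virtual nodes. This is an illustration only: the `Node` type and `insert` function are hypothetical, the errno constants are the Linux values written out to avoid a libc dependency, and overwriting an existing leaf is an assumption the commit message does not address.

```rust
use std::collections::BTreeMap;

// Linux errno values, spelled out so the sketch has no external dependencies.
const ENOENT: i32 = 2;
const ENOTDIR: i32 = 20;

/// Hypothetical virtual tree node.
pub enum Node {
    File(Vec<u8>),
    Dir(BTreeMap<String, Node>),
}

/// Inserts `node` at `path`, where `path` may contain '/' separators.
/// Every intermediate component must already be a virtual directory.
pub fn insert(
    root: &mut BTreeMap<String, Node>,
    path: &str,
    node: Node,
) -> Result<(), i32> {
    let mut parts: Vec<&str> = path.split('/').collect();
    // `split` always yields at least one item, so the leaf is always present.
    let leaf = parts.pop().expect("split yields at least one item");

    let mut dir = root;
    for part in parts {
        dir = match dir.get_mut(part) {
            Some(Node::Dir(children)) => children,
            // An intermediate component exists but is a file.
            Some(Node::File(_)) => return Err(-ENOTDIR),
            // An intermediate component does not exist.
            None => return Err(-ENOENT),
        };
    }
    dir.insert(leaf.to_string(), node);
    Ok(())
}
```

Adding "etc/hostname" before "etc" fails with -ENOENT, and adding "etc/hostname/x" after "etc/hostname" was added as a file fails with -ENOTDIR.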
Add API to retrieve the built-in default init binary. Callers that use krun_disable_implicit_init() can use this to obtain the init binary and inject it themselves via krun_fs_add_overlay_file(). Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
NullFs implements the FileSystem trait with just an empty root directory. It can be wrapped with AugmentFs to serve virtual files without any host directory involvement. Fs::new() now accepts Option<String> for shared_dir — None selects NullFs. FsDeviceConfig and FsServer gain the corresponding variants. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
krun_set_root_disk_remount no longer creates a temporary empty host directory. Instead it configures a NullFs-backed virtiofs device (shared_dir: None) with init.krun overlaid via AugmentFs. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
The temporary root directory hack is gone (replaced by NullFs), so the ioctl that cleaned it up and the config flag that gated it are no longer needed. Remove allow_root_dir_delete from FsDeviceConfig, Fs::new(), passthrough Config, and all call sites. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
The exit-code ioctl is a krun mechanism, not a filesystem operation. Move it to the AugmentFs overlay where it is handled before any delegation to the inner filesystem. The Linux passthrough retains only EXPORT_FD (which needs access to passthrough-internal handle and export tables). The macOS passthrough no longer implements ioctl at all. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
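A rough sketch of the "handle before delegating" ordering described here. Everything in it is hypothetical: the `IoctlFs` trait, the command numbers, the `Overlay` type and the two toy backends are placeholders, not the real ioctl codes or types, and the ENOSYS value is the Linux one.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

// Placeholder command numbers, NOT the real ioctl codes.
const CMD_EXIT_CODE: u32 = 1;
const CMD_EXPORT_FD: u32 = 2;
// Linux errno value for "function not implemented".
const ENOSYS: i32 = 38;

/// Hypothetical one-method stand-in for the ioctl part of the fs trait.
pub trait IoctlFs {
    fn ioctl(&self, cmd: u32, arg: i32) -> Result<i32, i32>;
}

/// Toy backend that still implements one backend-specific ioctl.
pub struct LinuxLikeFs;

impl IoctlFs for LinuxLikeFs {
    fn ioctl(&self, cmd: u32, _arg: i32) -> Result<i32, i32> {
        match cmd {
            CMD_EXPORT_FD => Ok(0),
            _ => Err(-ENOSYS),
        }
    }
}

/// Toy backend that implements no ioctls at all.
pub struct MacLikeFs;

impl IoctlFs for MacLikeFs {
    fn ioctl(&self, _cmd: u32, _arg: i32) -> Result<i32, i32> {
        Err(-ENOSYS)
    }
}

/// Overlay that owns the exit-code mechanism.
pub struct Overlay<T> {
    inner: T,
    exit_code: AtomicI32,
}

impl<T: IoctlFs> Overlay<T> {
    pub fn new(inner: T) -> Self {
        Self { inner, exit_code: AtomicI32::new(0) }
    }

    pub fn exit_code(&self) -> i32 {
        self.exit_code.load(Ordering::SeqCst)
    }
}

impl<T: IoctlFs> IoctlFs for Overlay<T> {
    fn ioctl(&self, cmd: u32, arg: i32) -> Result<i32, i32> {
        if cmd == CMD_EXIT_CODE {
            // Handled here; the inner filesystem never sees this command.
            self.exit_code.store(arg, Ordering::SeqCst);
            return Ok(0);
        }
        self.inner.ioctl(cmd, arg)
    }
}
```

In this arrangement the exit code works identically over either backend, while backend-specific commands still reach the inner filesystem unchanged.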
Boot a VM with a pure NullFs root — no host directory at all. Every file in the root (init.krun, guest-agent, .krun_config.json, test data) is injected as a virtual overlay, and /dev, /proc, /sys are virtual empty directories used as mount points. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Boot from an ext4 block device via krun_set_root_disk_remount. The virtiofs root uses NullFs with init.krun and virtual mount-point directories overlaid. The guest verifies it pivoted to the block device root successfully. Uses dlsym for krun_add_disk/krun_set_root_disk_remount so the test compiles without BLK and skips gracefully at runtime. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
Build and test with the block device feature so the root-disk-remount test runs in CI. Install e2fsprogs (provides mke2fs) which the test needs to create the ext4 disk image. Assisted-by: OpenCode:claude-opus-4.6 Signed-off-by: Matej Hrica <mhrica@redhat.com>
As a thought, while you're implementing this, @mtjhrc, is there a reasonable opportunity to generalize this to supporting arbitrary virtual files (at least in a ro capacity)? I actually have a use-case where I want a
Yes, I thought about that, it would be pretty cool. But I would really want to land some Rust API first. And then we can just easily have a
Actually, I think I changed my mind: we can merge this before #670, but if we do it after, it doesn't matter either way.
I already did an initial rebase on top of your commits, so let's just do this one first.
/gemini review
Code Review
This pull request introduces a virtual inode overlay system (AugmentFs) for virtiofs, allowing the injection of synthetic, memory-backed files and directories into the guest filesystem. It includes a new NullFs implementation for virtual-only filesystems, an InodeAllocator for unique FUSE inode management, and updated API functions to support custom init binary injection and overlay file/directory creation. I have reviewed the changes and agree with the feedback regarding the unnecessary restriction on empty virtual files in krun_fs_add_overlay_file.
if c_fs_tag.is_null() || c_path.is_null() || data.is_null() || data_len == 0 {
    return -libc::EINVAL;
}
The check data_len == 0 prevents the creation of empty virtual files. This seems like an unnecessary restriction, as empty files are a valid use case (e.g., for lock files or placeholders). Consider removing this part of the condition to allow creating zero-length files. The data.is_null() check should be kept, as slice::from_raw_parts requires a non-null pointer even for zero-length slices.
if c_fs_tag.is_null() || c_path.is_null() || data.is_null() {
    return -libc::EINVAL;
}
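For illustration, a small sketch of why the null check can stay while the length check goes. The helper `copy_overlay_data` is hypothetical and not part of the PR; it assumes the overlay copies the caller's bytes into owned memory. `slice::from_raw_parts` accepts a length of zero as long as the pointer is non-null and aligned, so an empty file only needs a valid pointer.

```rust
use std::slice;

/// Hypothetical helper: copies `len` bytes from a C pointer into owned memory.
/// Returns `None` for a null pointer; a zero length is accepted.
///
/// # Safety
/// `data` must be valid for reads of `len` bytes (for `len == 0` it only has
/// to be non-null and aligned).
pub unsafe fn copy_overlay_data(data: *const u8, len: usize) -> Option<Vec<u8>> {
    if data.is_null() {
        return None;
    }
    // SAFETY: the pointer is non-null here and the caller guarantees that it
    // is valid for `len` bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    Some(bytes.to_vec())
}
```

A caller that wants an empty virtual file can pass any valid non-null pointer together with a length of 0 and gets back an empty buffer.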
This PR replaces the hardcoded init.krun handling in the virtiofs passthrough backends with a generic virtual-files overlay (AugmentFs).

This introduces 2 new filesystem trait implementations:
- AugmentFs<T>, a wrapper that intercepts FUSE operations for virtual inodes (synthetic read-only files/directories backed by static data). It also handles our custom ioctls.
- NullFs, a minimal FileSystem impl with just an empty root directory, used when no host directory is needed.

The init.krun is registered as just a virtual file from the API layer. As a bonus, you can even inject the .krun_config.json as a virtual file.

Reimplemented krun_set_root_disk_remount() via NullFs + AugmentFs (#551 (comment)).

The public API is still mostly compatible. There are minor differences, for example init.krun disappears after it has been looked up once.

API breaking changes: krun_disable_implicit_init() and the other disable_implicit_* calls will be applied by default in a follow-up PR.

The init binary is now in its own init-blob crate. The direction for #634 (2.0 API) is to invert the dependency: init-blob would depend on libkrun's overlay APIs to inject itself, rather than libkrun depending on a specific init.
This supersedes #593 by @ggoodman, which tackled the same problem of decoupling init from the fs backends. This PR takes that idea further by removing awareness of init from the filesystem layer entirely - it's just another virtual file. #593 also introduced InitPolicy startup validation - how that fits into the 2.0 API (#634) with different payload types is still an open question.
Known limitations / future work: